Granulicella tundricola strain MP5ACTX9T is a novel species of the genus Granulicella in
subdivision 1 Acidobacteria. G. tundricola is a predominant member of soil bacterial
communities, active at low temperatures and under nutrient-limiting conditions in Arctic alpine
tundra. The organism is a cold-adapted acidophile and a versatile heterotroph that hydrolyzes
a suite of sugars and complex polysaccharides. Genome analysis revealed metabolic
versatility, with genes involved in the metabolism and transport of carbohydrates, including
gene modules encoding carbohydrate-active enzyme (CAZy) families for the breakdown,
utilization and biosynthesis of diverse structural and storage polysaccharides such as
plant-based carbon polymers. The genome of G. tundricola strain MP5ACTX9T consists of a
circular chromosome of 4,309,151 bp and five megaplasmids, with a total genome content
of 5,503,984 bp. The genome comprises 4,705 protein-coding genes and 52 RNA genes.



Introduction 

Strain MP5ACTX9T (=ATCC BAA-1859T =DSM 23138T) is the type strain of Granulicella
tundricola [tun.dri.co'la. N.L. n. tundra, tundra, a cold treeless region; L. masc. suffix
-cola (from L. n. incola) dweller; N.L. n. tundricola tundra dweller]. It was isolated from
soil at the Malla Nature Reserve (Kilpisjärvi, Finland; 69°01'N, 20°50'E) and described
along with other species of the genus Granulicella isolated from tundra soil [1].

Acidobacteria is a phylogenetically and physiologically diverse phylum [2,3], the members
of which are ubiquitously found in diverse habitats and are abundant in most soil
environments [4,5], including Arctic tundra soils [6,7]. Acidobacteria are relatively
difficult to cultivate, as they have slow growth rates. To date, only subdivisions 1, 3, 4,
8, 10 and 23 Acidobacteria are defined by taxonomically characterized representatives
[8-23], as well as three 'Candidatus' taxa [24,25]. The phylogenetic diversity, ubiquity and
abundance of this group suggest that they play important ecological roles in soils. The
abundance of Acidobacteria correlates with soil pH [26,27] and carbon [28,29], with
subdivision 1 Acidobacteria being most abundant in slightly acidic soils. Acidobacteria,
including members of the genera Granulicella and Terriglobus, dominate the acidic tundra
heaths of northern Finland [26,30-32]. Using selective isolation techniques, we have been
able to isolate several slow-growing and fastidious strains of Acidobacteria [1,11]. On the
basis of phylogenetic, phenotypic and chemotaxonomic data, including 16S rRNA and rpoB gene
sequence similarity and DNA-DNA hybridization, strain MP5ACTX9T was classified as a novel
species of the genus Granulicella [1]. Here, we summarize the physiological features
together with the complete genome sequence, annotation and data analysis of Granulicella
tundricola strain MP5ACTX9T.

Classification and features 

Within the genus Granulicella, eight species are described with validly published names:
G. mallensis MP5ACTX8T, G. tundricola MP5ACTX9T, G. arctica MP5ACTX2T and G. sapmiensis
S6CTX5AT, isolated from Arctic tundra soil [1], and G. paludicola OB1010T, G. paludicola
LCBR1, G. pectinivorans TPB6011T, G. rosea TPO1014T and G. aggregans TPB6028T, isolated
from sphagnum peat bogs [2].

Strain MP5ACTX9T shares 95.5 - 97.2% 16S rRNA gene identity with the tundra soil strains
G. mallensis MP5ACTX8T (95.5%), G. arctica MP5ACTX2T (96.9%) and G. sapmiensis S6CTX5AT
(97.2%), and 95.2 - 97.7% identity with the sphagnum peat bog strains G. pectinivorans
TPB6011T (97.7%), G. rosea TPO1014T (97.2%), G. aggregans TPB6028T (96.8%), G. paludicola
LCBR1 (95.9%) and G. paludicola OB1010T (95.3%). Phylogenetic analysis based on the 16S
rRNA gene of taxonomically classified strains of the family Acidobacteriaceae placed
G. rosea type strain T4T (AM887759) as the closest taxonomically classified relative of
G. tundricola strain MP5ACTX9T (Table 1, Figure 1).
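
The identity values above are pairwise comparisons of aligned 16S rRNA gene sequences. A minimal sketch of one common way to compute percent identity from a pairwise alignment is given here. The function name and the toy sequences are hypothetical, and tools differ in how they count gapped columns, so this is not necessarily the calculation used in [1].

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are skipped, not counted as mismatches
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (not real 16S data): 9 ungapped columns, 8 of them identical.
toy_identity = percent_identity("ACGU-ACGUA", "ACGUAACCUA")
print(f"{toy_identity:.2f}% identity")
```

Skipping gapped columns is only one convention; counting them as mismatches would give a lower value for the same alignment.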
Figure 1. Phylogenetic tree highlighting the position of G. tundricola MP5ACTX9T (shown in bold) relative to the
other type strains within subdivision 1 Acidobacteria. The maximum likelihood tree was inferred from 1,361 aligned
positions of the 16S rRNA gene sequences and derived based on the Tamura-Nei model using MEGA 5 [42]. Bootstrap
values >50 (expressed as percentages of 1,000 replicates) are shown at branch points. Bar: 0.01 substitutions
per nucleotide position. The corresponding GenBank accession numbers are displayed in parentheses. Strains
whose genomes have been sequenced are marked with an asterisk: G. mallensis MP5ACTX8T (CP003130), G.
tundricola MP5ACTX9T (CP002480), T. saanensis SP1PR4T (CP002467), T. roseus KBS63T (CP003379) and A.
capsulatum ATCC 51196T (CP001472). Bryobacter aggregatus MPL3 (AM162405) in SD3 Acidobacteria was used
as an outgroup.
Table 1. Classification and general features of G. tundricola strain MP5ACTX9T

MIGS ID    Property              Term                                                          Evidence code (a)
           Classification        Domain Bacteria                                               TAS [33]
                                 Phylum Acidobacteria                                          TAS [34,35]
                                 Class Acidobacteria                                           TAS [36,37]
                                 Order Acidobacteriales                                        TAS [37,38]
                                 Family Acidobacteriaceae                                      TAS [35,39]
                                 Genus Granulicella                                            TAS [1,40]
                                 Species Granulicella tundricola                               TAS [1]
                                 Type strain: MP5ACTX9 (ATCC BAA-1859 = DSM 23138)
           Gram stain            negative                                                      TAS
           Cell shape            rod                                                           TAS
           Motility              non-motile                                                    TAS
           Sporulation           not reported                                                  NAS
           Temperature range     4-28°C                                                        TAS
           Optimum temperature   21-24°C                                                       TAS
           pH range; Optimum     3.5-6.5; 5                                                    TAS
           Carbon source         D-glucose, maltose, cellobiose, D-fructose, D-galactose,      TAS
                                 lactose, lactulose, D-mannose, sucrose, trehalose, D-xylose,
                                 raffinose, N-acetyl-D-glucosamine, glutamate
MIGS-6     Habitat               terrestrial, tundra soil                                      TAS
MIGS-6.3   Salinity              No growth with >1.0% NaCl (w/v)                               TAS
MIGS-22    Oxygen requirement    aerobic                                                       TAS
MIGS-15    Biotic relationship   free-living                                                   TAS
MIGS-14    Pathogenicity         non-pathogen                                                  NAS
MIGS-4     Geographic location   Malla Nature Reserve, Arctic-alpine tundra, Finland           TAS
MIGS-5     Sample collection     2006                                                          TAS
MIGS-4.1   Latitude              69°01'N                                                       TAS
MIGS-4.2   Longitude             20°50'E                                                       TAS
MIGS-4.4   Altitude              700 m                                                         TAS

(a) Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the
literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based
on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene
Ontology project [41].

Morphology and physiology

G. tundricola cells are Gram-negative, non-motile, aerobic rods, approximately 0.5 μm wide
and 0.5 - 1.8 μm long. Colonies on R2A agar are pink, circular, convex and smooth. Growth
occurs at 4 to 28°C and at pH 3.5-6.5, with an optimum at 21-24°C and pH 5 (Fig. 2).
Genotypic analyses, including low rpoB gene sequence similarity, and phenotypic
characteristics clearly distinguished strain MP5ACTX9T from other Granulicella
species/strains, leading us to conclude that MP5ACTX9T represents a novel species of the
genus Granulicella, for which the name Granulicella tundricola sp. nov. was proposed [1].

Strain MP5ACTX9T hydrolyzed a range of complex to simple carbon substrates [1], including
complex polysaccharides such as aesculin, pectin, laminarin, starch and pullulan, but not
gelatin, cellulose, lichenan, sodium alginate, xylan, chitosan or chitin. Strain MP5ACTX9T
also utilized the following compounds as growth substrates: D-glucose, maltose, cellobiose,
D-fructose, D-galactose, lactose, lactulose, D-mannose, sucrose, trehalose, D-xylose,
raffinose, N-acetyl-D-glucosamine, glutamate and gluconic acid. Enzyme activities reported
for strain MP5ACTX9T include acid phosphatase, esterase (C4 and C8), leucine arylamidase,
valine arylamidase, α-chymotrypsin, trypsin, naphthol-AS-BI-phosphohydrolase, α- and
β-galactosidases, α- and β-glucosidases, N-acetyl-β-glucosaminidase, β-glucuronidase,
α-fucosidase and α-mannosidase; the strain was negative for alkaline phosphatase and lipase
(C14). Strain MP5ACTX9T is resistant to ampicillin, erythromycin, chloramphenicol,
neomycin, streptomycin, tetracycline, gentamicin, bacitracin, polymyxin B and penicillin,
but susceptible to rifampicin, kanamycin, lincomycin and novobiocin.

Chemotaxonomy 

The major cellular fatty acids in G. tundricola are iso-C15:0 (46.4%), C16:1ω7c (35.0%)
and C16:0 (6.6%). The cellular fatty acid composition of strain MP5ACTX9T was similar to
that of other Granulicella strains, with the fatty acids iso-C15:0 and C16:1ω7c being most
abundant in all strains. Strain MP5ACTX9T contains MK-8 as the major quinone and also
contains 4% of MK-7.

Genome sequencing and annotation 

Genome project history 

G. tundricola strain MP5ACTX9T was selected for sequencing in 2009 by the DOE Joint Genome
Institute (JGI) community sequencing program. The Quality Draft (QD) assembly and
annotation were completed on May 24, 2010. The GenBank Date of Release was February 2,
2011. The genome project is deposited in the Genomes On-Line Database (GOLD) [43] and the
complete genome sequence of strain MP5ACTX9T is deposited in GenBank (CP002480.1). Table 2
presents the project information and its association with MIGS version 2.0 [44].
Table 2. Project information

MIGS ID     Property                  Term
MIGS 31     Finishing quality         Finished
MIGS 28     Libraries used            Three libraries: an Illumina GAii shotgun library (GUIX),
                                      a 454 Titanium standard library (GTWG, GWTA) and a
                                      paired end 454 library
MIGS 29     Sequencing platforms      454 Titanium standard, 454 Paired End, Illumina
MIGS 31.2   Fold coverage             20× (454), 274× (Illumina)
MIGS 30     Assemblers                Newbler, VELVET, PHRAP
MIGS 32     Gene calling method       Prodigal, GenePRIMP
            Locus Tag                 AciX9
            GenBank ID                CP002480.1
            GenBank Date of Release   February 2, 2011
            GOLD ID                   Gc01833
            BIOPROJECT                PRJNA50551, PRJNA47621
            Project relevance         Environmental, Biogeochemical cycling of Carbon, Biotechnological, GEBA




Growth conditions and genomic DNA extraction 

G. tundricola MP5ACTX9T was cultivated on R2 medium as previously described [1]. Genomic
DNA (gDNA) of high sequencing quality was isolated using a modified CTAB method and
evaluated according to the Quality Control (QC) guidelines provided by the DOE Joint
Genome Institute [45].

Genome sequencing and assembly 

The finished genome of G. tundricola MP5ACTX9T (JGI ID 4088693) was generated at the DOE
Joint Genome Institute (JGI) using a combination of Illumina [46] and 454 technologies
[47]. For this genome we constructed and sequenced an Illumina GAii shotgun library, which
generated 42,620,699 reads totaling 3239 Mb, a 454 Titanium standard library, which
generated 146,119 reads, and three paired end 454 libraries with an average insert size of
9.3 kb, which generated 178,757 reads, totaling 154.3 Mb of 454 data. All general aspects
of library construction and sequencing performed at the JGI can be found at the JGI
website [45]. The 454 Titanium standard data and the 454 paired end data were assembled
with Newbler, version 2.3. Illumina sequencing data were assembled with Velvet, version
0.7.63 [48]. The 454 Newbler consensus shreds, the Illumina Velvet consensus shreds and
the read pairs in the 454 paired end library were integrated using parallel phrap, version
SPS - 4.24 (High Performance Software, LLC) [49]. The software Consed [50] was used in the
finishing process. The Phred/Phrap/Consed software package [51] was used for sequence
assembly and quality assessment in the subsequent finishing process. Illumina data were
used to correct potential base errors and increase consensus quality using the software
Polisher developed at JGI (Alla Lapidus, unpublished). Possible misassemblies were
corrected using gapResolution (Cliff Han, unpublished), Dupfinisher [52] or sequencing
cloned bridging PCR fragments with sub-cloning. Gaps between contigs were closed by editing
in Consed, by PCR and by Bubble PCR (J-F Cheng, unpublished) primer walks. The final
assembly is based on 29.1 Mb of 454 draft data, which provides an average 20× coverage of
the genome, and 975 Mb of Illumina draft data, which provides an average 274× coverage of
the genome.

Genome annotation 

Genes were identified using Prodigal [53] as part of the Oak Ridge National Laboratory
genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP
pipeline [54]. The predicted CDSs were translated and used to search the National Center
for Biotechnology Information (NCBI) non-redundant database, UniProt, TIGRFam, Pfam, PRIAM,
KEGG, COGs [55,56] and InterPro. These data sources were combined to assert a product
description for each predicted protein. Non-coding genes and miscellaneous features were
predicted using tRNAscan-SE [57], RNAmmer [58], Rfam [59], TMHMM [60] and SignalP [61].
Additional gene prediction analysis and functional annotation were performed within the
Integrated Microbial Genomes Expert Review (IMG-ER) platform [62].
Genome properties

The genome is 5,503,984 bp in size, which includes the 4,309,151 bp chromosome and five
plasmids, pACIX901 (0.48 Mbp), pACIX902 (0.3 Mbp), pACIX903 (0.19 Mbp), pACIX904 (0.12
Mbp) and pACIX905 (0.12 Mbp), with a GC content of 59.9 mol%. There are 52 RNA genes
(Figures 3 and 4, and Table 3). Of the 4,758 predicted genes, 4,706 are protein-coding
genes (CDSs) and 163 are pseudogenes. Of the total CDSs, 68.8% represent COG functional
categories and 27.5% contain signal peptides. The distribution of genes into COG
functional categories is presented in Figure 3, Table 4 and Table 5.
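
As a quick arithmetic check on the replicon sizes quoted above, the following minimal sketch derives the combined plasmid length from the total genome size and the chromosome length, and compares it with the sum of the rounded plasmid sizes. The variable and function names are illustrative and are not part of the original analysis.

```python
# Sizes in bp, taken from the text above.
GENOME_BP = 5_503_984
CHROMOSOME_BP = 4_309_151

# Rounded plasmid sizes in Mbp, as quoted in the text.
PLASMIDS_MBP = {
    "pACIX901": 0.48,
    "pACIX902": 0.30,
    "pACIX903": 0.19,
    "pACIX904": 0.12,
    "pACIX905": 0.12,
}


def plasmid_total_bp(genome_bp: int, chromosome_bp: int) -> int:
    """Combined plasmid length implied by the genome and chromosome sizes."""
    return genome_bp - chromosome_bp


exact_bp = plasmid_total_bp(GENOME_BP, CHROMOSOME_BP)
rounded_mbp = sum(PLASMIDS_MBP.values())
plasmid_share = 100 * exact_bp / GENOME_BP

print(f"plasmids (exact): {exact_bp:,} bp")
print(f"plasmids (sum of rounded sizes): {rounded_mbp:.2f} Mbp")
print(f"plasmid share of genome: {plasmid_share:.1f}%")
```

The two figures agree to within the rounding of the individual plasmid sizes (about 1.19 Mbp from the exact totals versus 1.21 Mbp from the rounded values), and the plasmids account for roughly a fifth of the genome.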



Table 3. Summary of genome: one chromosome and five plasmids

Label              Size (Mb)   Topology   INSDC identifier   RefSeq ID
Chromosome         4.3         circular   CP002480.1         NC_015064.1
Plasmid pACIX901   0.48        circular   CP002481.1         NC_015057.1
Plasmid pACIX902   0.3         circular   CP002482.1         NC_015065.1
Plasmid pACIX903   0.19        circular   CP002483.1         NC_015058.1
Plasmid pACIX904   0.12        circular   CP002484.1         NC_015059.1
Plasmid pACIX905   0.12        circular   CP002485.1         NC_015060.1



Table 4. Genome statistics.

Attribute                          Value       % of Total
Genome size (bp)                   5,503,984   100
DNA coding (bp)                    4,759,459   86.5
DNA G+C (bp)                       3,301,098   60.0
DNA scaffolds                      6           100
Total genes                        4,757       100
Protein coding genes               4,705       98.9
RNA genes                          52          1.1
Pseudo genes                       163         3.4
Genes in internal clusters         2,395       50.4
Genes with function prediction     2,936       61.7
Genes assigned to COGs             3,259       68.5
Genes with Pfam domains            3,504       73.6
Genes with signal peptides         652         13.7
Genes with transmembrane helices   1,108       23.3
CRISPR repeats                     0

The total is based on either the size of the genome in base
pairs or the protein coding genes in the annotated genome.
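
The percentage column of Table 4 can be recomputed from the counts. The sketch below does this for a few rows. Using the genome size as the denominator for base-pair rows follows the table note, whereas using the total gene count (4,757) for gene rows is an assumption made here because it reproduces the tabulated values for the rows shown. The names are illustrative only.

```python
# Counts copied from Table 4; the dictionary layout is only for illustration.
GENOME_BP = 5_503_984
TOTAL_GENES = 4_757  # assumed denominator for the gene rows


def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


base_pair_rows = {"DNA coding": 4_759_459, "DNA G+C": 3_301_098}
gene_rows = {
    "Protein coding genes": 4_705,
    "RNA genes": 52,
    "Pseudo genes": 163,
    "Genes assigned to COGs": 3_259,
}

table4_percent = {name: percent(bp, GENOME_BP) for name, bp in base_pair_rows.items()}
table4_percent.update({name: percent(n, TOTAL_GENES) for name, n in gene_rows.items()})

for name, value in table4_percent.items():
    print(f"{name}: {value}%")
```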
Figure 3. Circular representation of the chromosome of G. tundricola MP5ACTX9T displaying relevant genome
features. From outside to center: genes on forward strand (colored by COG categories), genes on reverse strand
(colored by COG categories), RNA genes (tRNAs green, rRNAs red, other RNAs black), GC content and GC skew.

Figure 4. Circular representation of the plasmids of G. tundricola MP5ACTX9T displaying relevant genome
features. From outside to center: genes on forward strand (colored by COG categories), genes on reverse strand
(colored by COG categories), RNA genes (tRNAs green, rRNAs red, other RNAs black), GC content and GC skew. Order
and size from left to right: pACIX901, 0.48 Mbp; pACIX902, 0.3 Mbp; pACIX903, 0.19 Mbp; pACIX904, 0.12
Mbp; pACIX905, 0.12 Mbp.
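
The innermost rings of the genome maps plot GC content and GC skew. As a minimal sketch of how such tracks are commonly computed (not necessarily the procedure used to draw these figures), GC skew is taken here as (G - C)/(G + C) over non-overlapping windows. The function name, the window size and the toy sequence are illustrative assumptions.

```python
def gc_tracks(seq: str, window: int = 10):
    """Return (start, gc_fraction, gc_skew) for each non-overlapping window."""
    seq = seq.upper()
    tracks = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_fraction = (g + c) / window
        # Skew is undefined when a window has no G or C; report 0.0 in that case.
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        tracks.append((start, gc_fraction, gc_skew))
    return tracks


# Toy sequence (not genome data): a G-rich window followed by a C-rich one.
toy = "GGGGGCATAT" + "CCCCGGATAT"
toy_tracks = gc_tracks(toy)
for start, gc, skew in toy_tracks:
    print(start, round(gc, 2), round(skew, 2))
```

On a real replicon the window would typically span thousands of base pairs, and a sign change in the skew track is often used as a rough indicator of the replication origin or terminus.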



Table 5. Number of genes associated with general COG functional categories.

Code   Value   %age    Description
J      160     4.45    Translation, ribosomal structure and biogenesis
A      2       0.06    RNA processing and modification
K      249     6.93    Transcription
L      222     6.18    Replication, recombination and repair
B      1       0.03    Chromatin structure and dynamics
D      33      0.92    Cell cycle control, cell division, chromosome partitioning
V      68      1.89    Defense mechanisms
T      212     5.9     Signal transduction mechanisms
M      287     7.98    Cell wall/membrane biogenesis
N      73      2.03    Cell motility
U      123     3.42    Intracellular trafficking and secretion
O      125     3.48    Posttranslational modification, protein turnover, chaperones
C      174     4.84    Energy production and conversion
G      248     6.9     Carbohydrate transport and metabolism
E      234     6.51    Amino acid transport and metabolism
F      68      1.89    Nucleotide transport and metabolism
H      147     4.09    Coenzyme transport and metabolism
I      126     3.5     Lipid transport and metabolism
P      137     3.81    Inorganic ion transport and metabolism
Q      91      2.53    Secondary metabolites biosynthesis, transport and catabolism
R      446     12.41   General function prediction only
S      370     10.29   Function unknown
-      1498    31.49   Not in COGs

The total is based on the total number of protein coding genes in the genome.

Discussion

Granulicella tundricola MP5ACTX9T is a tundra soil strain with a genome consisting of a
circular chromosome and five megaplasmids ranging in size from 1.1 × 10^5 to 4.7 × 10^5 bp,
for a total genome size of 5.5 Mbp. The G. tundricola genome also contains close to twice
as many pseudogenes, and a large number of mobile genetic elements, compared to
Granulicella mallensis and Terriglobus saanensis, two other Acidobacteria isolated from
the same habitat [29]. A large number of genes were assigned to COG functional categories
for transport and metabolism of carbohydrates (6.9%) and amino acids (6.5%), cell envelope
biogenesis (8%) and transcription (6.9%). Further genome analysis revealed an abundance of
gene modules encoding functional activities within the carbohydrate-active enzyme (CAZy)
families [63,64] involved in the breakdown, utilization and biosynthesis of carbohydrates.
G. tundricola hydrolyzed complex carbon polymers, including CMC, pectin, lichenin,
laminarin and starch, and utilized sugars such as cellobiose, D-mannose, D-xylose and
D-trehalose.
Genome predictions for CDSs encoding for 